Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 36453 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 5.3 MiB |
| Average record size in memory | 152.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 10 |
Unnamed: 0 is highly correlated with ID | High correlation |
ID is highly correlated with Unnamed: 0 | High correlation |
NAME_FAMILY_STATUS is highly correlated with CNT_FAM_MEMBERS | High correlation |
CNT_FAM_MEMBERS is highly correlated with NAME_FAMILY_STATUS | High correlation |
CODE_GENDER is highly correlated with FLAG_OWN_CAR and 1 other fields | High correlation |
FLAG_OWN_CAR is highly correlated with CODE_GENDER | High correlation |
NAME_INCOME_TYPE is highly correlated with OCCUPATION_TYPE and 2 other fields | High correlation |
NAME_FAMILY_STATUS is highly correlated with CNT_FAM_MEMBERS | High correlation |
OCCUPATION_TYPE is highly correlated with CODE_GENDER and 1 other fields | High correlation |
CNT_FAM_MEMBERS is highly correlated with NAME_FAMILY_STATUS | High correlation |
AGE is highly correlated with NAME_INCOME_TYPE | High correlation |
YEARS_EMPLOYED is highly correlated with NAME_INCOME_TYPE | High correlation |
Unnamed: 0 is uniformly distributed | Uniform |
Unnamed: 0 has unique values | Unique |
ID has unique values | Unique |
OCCUPATION_TYPE has 1241 (3.4%) zeros | Zeros |
YEARS_EMPLOYED has 6135 (16.8%) zeros | Zeros |
Reproduction
| Analysis started | 2022-04-27 14:29:02.072889 |
|---|---|
| Analysis finished | 2022-04-27 14:29:45.009411 |
| Duration | 42.94 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
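A report of this kind can presumably be regenerated along the following lines. This is a minimal sketch that assumes pandas and pandas-profiling v3.1.0 are installed; the function name `build_report`, the report title, and both file paths are hypothetical, since the report does not name its input file or the script that produced it.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (a sketch, not the original script)."""
    # Imports are deferred so the function can be defined even when the
    # third-party packages are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Assumption: the data was loaded with default options. A column named
    # "Unnamed: 0" usually means a saved index was read back as data;
    # pd.read_csv(csv_path, index_col=0) would keep it out of the profile.
    df = pd.read_csv(csv_path)

    profile = ProfileReport(df, title="Dataset profile")  # hypothetical title
    profile.to_file(html_path)
    return profile
```

A call such as `build_report("applications.csv", "report.html")` would then write the report; both file names are placeholders.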
Unnamed: 0
Real number (ℝ≥0)
High correlation, Uniform, Unique
| Distinct | 36453 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18226 |
| Minimum | 0 |
|---|---|
| Maximum | 36452 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1822.6 |
| Q1 | 9113 |
| median | 18226 |
| Q3 | 27339 |
| 95-th percentile | 34629.4 |
| Maximum | 36452 |
| Range | 36452 |
| Interquartile range (IQR) | 18226 |
Descriptive statistics
| Standard deviation | 10523.21902 |
|---|---|
| Coefficient of variation (CV) | 0.5773740271 |
| Kurtosis | -1.2 |
| Mean | 18226 |
| Median Absolute Deviation (MAD) | 9113 |
| Skewness | 0 |
| Sum | 664392378 |
| Variance | 110738138.5 |
| Monotonicity | Strictly increasing |
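The figures above are what a plain row index 0, 1, ..., n − 1 would produce. The sketch below assumes the column holds exactly those values (consistent with the unique, uniform, and strictly increasing findings) and recomputes the statistics with the Python standard library. For such an index the sample variance also has the closed form n(n + 1)/12.

```python
import math
import statistics

n = 36453                   # number of observations stated in the report
index = list(range(n))      # assumed column content: 0, 1, ..., n - 1

mean = statistics.mean(index)
variance = statistics.variance(index)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
q1, median, q3 = statistics.quantiles(index, n=4, method="inclusive")
iqr = q3 - q1

print(mean, variance, round(std, 5), round(cv, 10), q1, median, q3, iqr)
```

If the recomputed values agree with the tables, the column carries no information beyond row order and is a candidate for removal before modelling.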
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 24305 | 1 | < 0.1% |
| 24299 | 1 | < 0.1% |
| 24300 | 1 | < 0.1% |
| 24301 | 1 | < 0.1% |
| 24302 | 1 | < 0.1% |
| 24303 | 1 | < 0.1% |
| 24304 | 1 | < 0.1% |
| 24306 | 1 | < 0.1% |
| 24297 | 1 | < 0.1% |
| Other values (36443) | 36443 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 36452 | 1 | |
| 36451 | 1 | |
| 36450 | 1 | |
| 36449 | 1 | |
| 36448 | 1 | |
| 36447 | 1 | |
| 36446 | 1 | |
| 36445 | 1 | |
| 36444 | 1 | |
| 36443 | 1 |
ID
Real number (ℝ≥0)
| Distinct | 36453 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5078227.661 |
| Minimum | 5008804 |
|---|---|
| Maximum | 5150487 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 5008804 |
|---|---|
| 5-th percentile | 5018456.2 |
| Q1 | 5042027 |
| median | 5074615 |
| Q3 | 5115397 |
| 95-th percentile | 5146024.4 |
| Maximum | 5150487 |
| Range | 141683 |
| Interquartile range (IQR) | 73370 |
Descriptive statistics
| Standard deviation | 41877.01797 |
|---|---|
| Coefficient of variation (CV) | 0.00824638452 |
| Kurtosis | -1.212733304 |
| Mean | 5078227.661 |
| Median Absolute Deviation (MAD) | 38094 |
| Skewness | 0.08619147174 |
| Sum | 1.851166329 × 10¹¹ |
| Variance | 1753684634 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5008804 | 1 | < 0.1% |
| 5096994 | 1 | < 0.1% |
| 5096987 | 1 | < 0.1% |
| 5096988 | 1 | < 0.1% |
| 5096990 | 1 | < 0.1% |
| 5096991 | 1 | < 0.1% |
| 5096992 | 1 | < 0.1% |
| 5096993 | 1 | < 0.1% |
| 5096995 | 1 | < 0.1% |
| 5096982 | 1 | < 0.1% |
| Other values (36443) | 36443 |
| Value | Count | Frequency (%) |
| 5008804 | 1 | |
| 5008805 | 1 | |
| 5008806 | 1 | |
| 5008808 | 1 | |
| 5008809 | 1 | |
| 5008810 | 1 | |
| 5008811 | 1 | |
| 5008812 | 1 | |
| 5008813 | 1 | |
| 5008814 | 1 |
| Value | Count | Frequency (%) |
| 5150487 | 1 | |
| 5150485 | 1 | |
| 5150484 | 1 | |
| 5150483 | 1 | |
| 5150482 | 1 | |
| 5150481 | 1 | |
| 5150480 | 1 | |
| 5150479 | 1 | |
| 5150478 | 1 | |
| 5150477 | 1 |
CODE_GENDER
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 24429 | |
| 1 | 12024 |
FLAG_OWN_CAR
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 22613 | |
| 1 | 13840 |
FLAG_OWN_REALTY
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 24502 | |
| 0 | 11951 |
AMT_INCOME_TOTAL
Real number (ℝ≥0)
| Distinct | 265 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 186684.6186 |
| Minimum | 27000 |
|---|---|
| Maximum | 1575000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 27000 |
|---|---|
| 5-th percentile | 76500 |
| Q1 | 121500 |
| median | 157500 |
| Q3 | 225000 |
| 95-th percentile | 360000 |
| Maximum | 1575000 |
| Range | 1548000 |
| Interquartile range (IQR) | 103500 |
Descriptive statistics
| Standard deviation | 101793.4761 |
|---|---|
| Coefficient of variation (CV) | 0.5452697544 |
| Kurtosis | 17.59701591 |
| Mean | 186684.6186 |
| Median Absolute Deviation (MAD) | 45000 |
| Skewness | 2.739006571 |
| Sum | 6805214402 |
| Variance | 1.036191178 × 10¹⁰ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 135000 | 4309 | 11.8% |
| 180000 | 3097 | 8.5% |
| 157500 | 3089 | 8.5% |
| 112500 | 2955 | 8.1% |
| 225000 | 2923 | 8.0% |
| 202500 | 2192 | 6.0% |
| 90000 | 1769 | 4.9% |
| 270000 | 1675 | 4.6% |
| 315000 | 1001 | 2.7% |
| 67500 | 873 | 2.4% |
| Other values (255) | 12570 |
| Value | Count | Frequency (%) |
| 27000 | 3 | < 0.1% |
| 29250 | 7 | |
| 30150 | 3 | < 0.1% |
| 31500 | 16 | |
| 31531.5 | 3 | < 0.1% |
| 31950 | 1 | < 0.1% |
| 32400 | 5 | < 0.1% |
| 33300 | 10 | |
| 33750 | 1 | < 0.1% |
| 36000 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 1575000 | 8 | < 0.1% |
| 1350000 | 6 | < 0.1% |
| 1125000 | 3 | < 0.1% |
| 990000 | 4 | < 0.1% |
| 945000 | 4 | < 0.1% |
| 900000 | 39 | |
| 810000 | 15 | < 0.1% |
| 787500 | 5 | < 0.1% |
| 765000 | 9 | < 0.1% |
| 742500 | 5 | < 0.1% |
NAME_INCOME_TYPE
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 4 |
|---|---|
| 2nd row | 4 |
| 3rd row | 4 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 4 | 18815 | |
| 0 | 8490 | |
| 1 | 6152 | 16.9% |
| 2 | 2985 | 8.2% |
| 3 | 11 | < 0.1% |
NAME_EDUCATION_TYPE
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 4 |
| 4th row | 4 |
| 5th row | 4 |
Common Values
| Value | Count | Frequency (%) |
| 4 | 24773 | |
| 1 | 9864 | 27.1% |
| 2 | 1410 | 3.9% |
| 3 | 374 | 1.0% |
| 0 | 32 | 0.1% |
NAME_FAMILY_STATUS
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 1 |
| 4th row | 3 |
| 5th row | 3 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 25048 | |
| 3 | 4828 | 13.2% |
| 0 | 2945 | 8.1% |
| 2 | 2100 | 5.8% |
| 4 | 1532 | 4.2% |
NAME_HOUSING_TYPE
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.282912243 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 168 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.9517223501 |
|---|---|
| Coefficient of variation (CV) | 0.7418452472 |
| Kurtosis | 9.494862667 |
| Mean | 1.282912243 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.29061941 |
| Sum | 46766 |
| Variance | 0.9057754317 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 32544 | |
| 5 | 1776 | 4.9% |
| 2 | 1128 | 3.1% |
| 4 | 575 | 1.6% |
| 3 | 262 | 0.7% |
| 0 | 168 | 0.5% |
| Value | Count | Frequency (%) |
| 0 | 168 | 0.5% |
| 1 | 32544 | |
| 2 | 1128 | 3.1% |
| 3 | 262 | 0.7% |
| 4 | 575 | 1.6% |
| 5 | 1776 | 4.9% |
| Value | Count | Frequency (%) |
| 5 | 1776 | 4.9% |
| 4 | 575 | 1.6% |
| 3 | 262 | 0.7% |
| 2 | 1128 | 3.1% |
| 1 | 32544 | |
| 0 | 168 | 0.5% |
FLAG_WORK_PHONE
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 28232 | |
| 1 | 8221 | 22.6% |
FLAG_PHONE
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 25706 | |
| 1 | 10747 |
FLAG_EMAIL
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 33182 | |
| 1 | 3271 | 9.0% |
OCCUPATION_TYPE
Real number (ℝ≥0)
| Distinct | 19 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.170712973 |
| Minimum | 0 |
|---|---|
| Maximum | 18 |
| Zeros | 1241 |
| Zeros (%) | 3.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 6 |
| median | 10 |
| Q3 | 12 |
| 95-th percentile | 15 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.292705391 |
|---|---|
| Coefficient of variation (CV) | 0.4680885122 |
| Kurtosis | -0.7084961249 |
| Mean | 9.170712973 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.4382491506 |
| Sum | 334300 |
| Variance | 18.42731958 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 12 | 11323 | |
| 8 | 6211 | |
| 3 | 3591 | 9.9% |
| 15 | 3485 | 9.6% |
| 10 | 3012 | 8.3% |
| 4 | 2135 | 5.9% |
| 6 | 1383 | 3.8% |
| 0 | 1241 | 3.4% |
| 11 | 1207 | 3.3% |
| 2 | 655 | 1.8% |
| Other values (9) | 2210 | 6.1% |
| Value | Count | Frequency (%) |
| 0 | 1241 | 3.4% |
| 1 | 551 | 1.5% |
| 2 | 655 | 1.8% |
| 3 | 3591 | |
| 4 | 2135 | 5.9% |
| 5 | 85 | 0.2% |
| 6 | 1383 | 3.8% |
| 7 | 60 | 0.2% |
| 8 | 6211 | |
| 9 | 175 | 0.5% |
| Value | Count | Frequency (%) |
| 18 | 173 | 0.5% |
| 17 | 592 | 1.6% |
| 16 | 151 | 0.4% |
| 15 | 3485 | 9.6% |
| 14 | 79 | 0.2% |
| 13 | 344 | 0.9% |
| 12 | 11323 | |
| 11 | 1207 | 3.3% |
| 10 | 3012 | 8.3% |
| 9 | 175 | 0.5% |
CNT_FAM_MEMBERS
Real number (ℝ≥0)
High correlation
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.196911091 |
| Minimum | 1 |
|---|---|
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 9 |
| Range | 8 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8994885611 |
|---|---|
| Coefficient of variation (CV) | 0.4094333015 |
| Kurtosis | 1.229319259 |
| Mean | 2.196911091 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.907514612 |
| Sum | 80084 |
| Variance | 0.8090796716 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 19463 | |
| 1 | 6987 | 19.2% |
| 3 | 6421 | 17.6% |
| 4 | 3106 | 8.5% |
| 5 | 397 | 1.1% |
| 6 | 58 | 0.2% |
| 7 | 19 | 0.1% |
| 9 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 6987 | 19.2% |
| 2 | 19463 | |
| 3 | 6421 | 17.6% |
| 4 | 3106 | 8.5% |
| 5 | 397 | 1.1% |
| 6 | 58 | 0.2% |
| 7 | 19 | 0.1% |
| 9 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 2 | < 0.1% |
| 7 | 19 | 0.1% |
| 6 | 58 | 0.2% |
| 5 | 397 | 1.1% |
| 4 | 3106 | 8.5% |
| 3 | 6421 | 17.6% |
| 2 | 19463 | |
| 1 | 6987 | 19.2% |
AGE
Real number (ℝ≥0)
| Distinct | 7182 |
|---|---|
| Distinct (%) | 19.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43.73850772 |
| Minimum | 20.50418558 |
|---|---|
| Maximum | 68.86383704 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 20.50418558 |
|---|---|
| 5-th percentile | 27.03409379 |
| Q1 | 34.11979712 |
| median | 42.61004675 |
| Q3 | 53.2194364 |
| 95-th percentile | 63.02388139 |
| Maximum | 68.86383704 |
| Range | 48.35965146 |
| Interquartile range (IQR) | 19.09963928 |
Descriptive statistics
| Standard deviation | 11.501045 |
|---|---|
| Coefficient of variation (CV) | 0.2629501005 |
| Kurtosis | -1.045705537 |
| Mean | 43.73850772 |
| Median Absolute Deviation (MAD) | 9.377331499 |
| Skewness | 0.18427999 |
| Sum | 1594399.822 |
| Variance | 132.2740361 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 42.48957884 | 54 | 0.1% |
| 34.70570922 | 54 | 0.1% |
| 46.25967679 | 38 | 0.1% |
| 40.15688207 | 37 | 0.1% |
| 45.90922469 | 32 | 0.1% |
| 41.45191209 | 32 | 0.1% |
| 42.91669233 | 32 | 0.1% |
| 37.75026181 | 30 | 0.1% |
| 38.70305345 | 30 | 0.1% |
| 39.4258609 | 29 | 0.1% |
| Other values (7172) | 36085 |
| Value | Count | Frequency (%) |
| 20.50418558 | 1 | < 0.1% |
| 21.09557349 | 1 | < 0.1% |
| 21.14485581 | 2 | |
| 21.23794465 | 4 | |
| 21.79100187 | 2 | |
| 21.84849792 | 1 | < 0.1% |
| 22.01551024 | 4 | |
| 22.05110303 | 1 | < 0.1% |
| 22.05657885 | 2 | |
| 22.08669583 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 68.86383704 | 2 | |
| 68.83098216 | 3 | |
| 68.71872797 | 1 | < 0.1% |
| 68.68861099 | 1 | < 0.1% |
| 68.47505424 | 2 | |
| 68.36553796 | 2 | |
| 68.34637262 | 1 | < 0.1% |
| 68.2998282 | 3 | |
| 68.2614975 | 4 | |
| 68.21221517 | 3 |
YEARS_EMPLOYED
Real number (ℝ≥0)
| Distinct | 3639 |
|---|---|
| Distinct (%) | 10.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.02440509 |
| Minimum | 0 |
|---|---|
| Maximum | 43.0207328 |
| Zeros | 6135 |
| Zeros (%) | 16.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1.117066059 |
| median | 4.249231675 |
| Q3 | 8.632620793 |
| 95-th percentile | 19.72661999 |
| Maximum | 43.0207328 |
| Range | 43.0207328 |
| Interquartile range (IQR) | 7.515554734 |
Descriptive statistics
| Standard deviation | 6.480410609 |
|---|---|
| Coefficient of variation (CV) | 1.075693037 |
| Kurtosis | 3.846197692 |
| Mean | 6.02440509 |
| Median Absolute Deviation (MAD) | 3.586658179 |
| Skewness | 1.758617789 |
| Sum | 219607.6388 |
| Variance | 41.99572166 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 6135 | 16.8% |
| 1.09790071 | 78 | 0.2% |
| 4.213638884 | 64 | 0.2% |
| 0.5475814014 | 63 | 0.2% |
| 5.714011924 | 61 | 0.2% |
| 4.594207958 | 61 | 0.2% |
| 6.929642635 | 56 | 0.2% |
| 1.259437223 | 54 | 0.1% |
| 3.175972128 | 53 | 0.1% |
| 4.547663539 | 52 | 0.1% |
| Other values (3629) | 29776 |
| Value | Count | Frequency (%) |
| 0 | 6135 | |
| 0.04654441912 | 3 | < 0.1% |
| 0.1177300013 | 1 | < 0.1% |
| 0.1779639555 | 2 | < 0.1% |
| 0.1807018625 | 1 | < 0.1% |
| 0.1916534905 | 4 | < 0.1% |
| 0.1943913975 | 1 | < 0.1% |
| 0.1998672115 | 17 | < 0.1% |
| 0.2135567465 | 1 | < 0.1% |
| 0.2162946536 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 43.0207328 | 1 | < 0.1% |
| 42.87836164 | 4 | < 0.1% |
| 41.69011 | 1 | < 0.1% |
| 41.26573441 | 3 | < 0.1% |
| 41.17264557 | 16 | |
| 40.75922161 | 6 | < 0.1% |
| 40.54840277 | 8 | |
| 40.45257603 | 2 | < 0.1% |
| 39.79821625 | 4 | < 0.1% |
| 39.62572811 | 6 | < 0.1% |
STATUS
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 284.9 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 32164 | |
| 1 | 4289 | 11.8% |
MONTHS_BALANCE
Real number (ℝ≥0)
| Distinct | 61 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.16396456 |
| Minimum | 0 |
|---|---|
| Maximum | 60 |
| Zeros | 315 |
| Zeros (%) | 0.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 284.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 12 |
| median | 24 |
| Q3 | 39 |
| 95-th percentile | 55 |
| Maximum | 60 |
| Range | 60 |
| Interquartile range (IQR) | 27 |
Descriptive statistics
| Standard deviation | 16.50100418 |
|---|---|
| Coefficient of variation (CV) | 0.6306767519 |
| Kurtosis | -1.037660108 |
| Mean | 26.16396456 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 0.2863865912 |
| Sum | 953755 |
| Variance | 272.283139 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7 | 888 | 2.4% |
| 11 | 828 | 2.3% |
| 6 | 824 | 2.3% |
| 8 | 820 | 2.2% |
| 5 | 815 | 2.2% |
| 17 | 807 | 2.2% |
| 3 | 800 | 2.2% |
| 10 | 798 | 2.2% |
| 16 | 785 | 2.2% |
| 15 | 774 | 2.1% |
| Other values (51) | 28314 |
| Value | Count | Frequency (%) |
| 0 | 315 | 0.9% |
| 1 | 551 | |
| 2 | 643 | |
| 3 | 800 | |
| 4 | 765 | |
| 5 | 815 | |
| 6 | 824 | |
| 7 | 888 | |
| 8 | 820 | |
| 9 | 770 |
| Value | Count | Frequency (%) |
| 60 | 321 | |
| 59 | 307 | |
| 58 | 332 | |
| 57 | 304 | |
| 56 | 345 | |
| 55 | 368 | |
| 54 | 358 | |
| 53 | 377 | |
| 52 | 463 | |
| 51 | 476 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
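The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are chosen here for the example, and ties are assumed to receive the mean of the ranks they span.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but nonlinear in x
print(spearman_rho(x, y), pearson_r(x, y))
```

Because y increases whenever x does, the ranks coincide and ρ reaches its maximum, while r on the raw values stays below 1.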
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
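The invariance property can be checked with a short sketch. The function name `pearson_r` and the sample data are made up for illustration, and the rescaling uses positive scale factors, since a negative factor would flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/n factors of the covariance and the variances cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]

print(pearson_r(x, y), pearson_r(x_shifted, y_shifted))
```

Both calls give the same coefficient up to floating-point rounding, which is why r reflects the strength of a linear relationship rather than its slope.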
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
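The pair-counting definition above translates directly into code. The sketch below implements that plain formula (often called tau-a), under the assumption that tied pairs simply count as neither concordant nor discordant. Statistical libraries commonly report tie-adjusted variants instead, so their results can differ when ties are present. The function name `kendall_tau_a` is chosen for this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))
```

In this sample, 8 of the 10 pairs are concordant and 2 are discordant, so τ = (8 − 2) / 10.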
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | Unnamed: 0 | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | AGE | YEARS_EMPLOYED | STATUS | MONTHS_BALANCE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 5008804 | 1 | 1 | 1 | 427500.0 | 4 | 1 | 0 | 4 | 1 | 0 | 0 | 12 | 2 | 32.868574 | 12.435574 | 1 | 15 |
| 1 | 1 | 5008805 | 1 | 1 | 1 | 427500.0 | 4 | 1 | 0 | 4 | 1 | 0 | 0 | 12 | 2 | 32.868574 | 12.435574 | 1 | 14 |
| 2 | 2 | 5008806 | 1 | 1 | 1 | 112500.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 17 | 2 | 58.793815 | 3.104787 | 0 | 29 |
| 3 | 3 | 5008808 | 0 | 0 | 1 | 270000.0 | 0 | 4 | 3 | 1 | 0 | 1 | 1 | 15 | 1 | 52.321403 | 8.353354 | 0 | 4 |
| 4 | 4 | 5008809 | 0 | 0 | 1 | 270000.0 | 0 | 4 | 3 | 1 | 0 | 1 | 1 | 15 | 1 | 52.321403 | 8.353354 | 0 | 26 |
| 5 | 5 | 5008810 | 0 | 0 | 1 | 270000.0 | 0 | 4 | 3 | 1 | 0 | 1 | 1 | 15 | 1 | 52.321403 | 8.353354 | 0 | 26 |
| 6 | 6 | 5008811 | 0 | 0 | 1 | 270000.0 | 0 | 4 | 3 | 1 | 0 | 1 | 1 | 15 | 1 | 52.321403 | 8.353354 | 0 | 38 |
| 7 | 7 | 5008812 | 0 | 0 | 1 | 283500.0 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 12 | 1 | 61.504343 | 0.000000 | 0 | 20 |
| 8 | 8 | 5008813 | 0 | 0 | 1 | 283500.0 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 12 | 1 | 61.504343 | 0.000000 | 0 | 16 |
| 9 | 9 | 5008814 | 0 | 0 | 1 | 283500.0 | 1 | 1 | 2 | 1 | 0 | 0 | 0 | 12 | 1 | 61.504343 | 0.000000 | 0 | 17 |
Last rows
| | Unnamed: 0 | ID | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | AMT_INCOME_TOTAL | NAME_INCOME_TYPE | NAME_EDUCATION_TYPE | NAME_FAMILY_STATUS | NAME_HOUSING_TYPE | FLAG_WORK_PHONE | FLAG_PHONE | FLAG_EMAIL | OCCUPATION_TYPE | CNT_FAM_MEMBERS | AGE | YEARS_EMPLOYED | STATUS | MONTHS_BALANCE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36443 | 36443 | 5149145 | 1 | 1 | 1 | 247500.0 | 4 | 4 | 1 | 1 | 1 | 0 | 0 | 8 | 2 | 29.985558 | 9.793493 | 1 | 25 |
| 36444 | 36444 | 5149158 | 1 | 1 | 1 | 247500.0 | 4 | 4 | 1 | 1 | 1 | 0 | 0 | 8 | 2 | 29.985558 | 9.793493 | 1 | 28 |
| 36445 | 36445 | 5149190 | 1 | 1 | 0 | 450000.0 | 4 | 1 | 1 | 1 | 0 | 1 | 1 | 3 | 3 | 26.960170 | 1.374429 | 1 | 11 |
| 36446 | 36446 | 5149729 | 1 | 1 | 1 | 90000.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 12 | 2 | 52.296762 | 4.711938 | 1 | 21 |
| 36447 | 36447 | 5149775 | 0 | 1 | 1 | 130500.0 | 4 | 4 | 1 | 1 | 0 | 1 | 0 | 8 | 2 | 44.181605 | 25.711685 | 1 | 19 |
| 36448 | 36448 | 5149828 | 1 | 1 | 1 | 315000.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 10 | 2 | 47.497211 | 6.625735 | 1 | 11 |
| 36449 | 36449 | 5149834 | 0 | 0 | 1 | 157500.0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 11 | 2 | 33.914454 | 3.627727 | 1 | 23 |
| 36450 | 36450 | 5149838 | 0 | 0 | 1 | 157500.0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 11 | 2 | 33.914454 | 3.627727 | 1 | 32 |
| 36451 | 36451 | 5150049 | 0 | 0 | 1 | 283500.0 | 4 | 4 | 1 | 1 | 0 | 0 | 0 | 15 | 2 | 49.167334 | 1.793329 | 1 | 9 |
| 36452 | 36452 | 5150337 | 1 | 0 | 1 | 112500.0 | 4 | 4 | 3 | 4 | 0 | 0 | 0 | 8 | 1 | 25.155890 | 3.266323 | 1 | 13 |